Surface Grammatical Analysis For The Extraction Of Terminological Noun Phrases

نویسنده

  • Didier Bourigault
چکیده

LEXTER is a software package for extracting terminology. A corpus of French language texts on any subject field is fed in, and LEXTER produces a list of likely terminological units to be submitted to an expert to be validated. To identify the terminological units, LEXTER takes their form into account and proceeds in two main stages : analysis, parsing. In the first stage, LEXTER uses a base of rules designed to indentify frontier markers in view to analysing the texts and extracting maximal-length noun phrases. In the second stage, LEXTER parses these maximal-length noun phrases to extract subgroups which by virtue of their grammatical structure and their place in the maximal-length noun phrases are likely to be terminological units. In this article, the type of analysis used (surface grammatical analysis) is highlighted, as the methodological approach adopted to adapt the rules (experimental approach). I) Constituting Constituting a terminology of a subject field, that is to say establishing a list of the terminological units that represent the concepts of this field, is an oft-encountered problem. For the Research Development Division of Electricit6 de France (French Electricity Board), this problem arose in the information documentation sector. An automatic indexing system, using different thesauri according to the application, has been operational for three years or more [Monteil 1990]. The terminologists and information scientists need a terminology a terminology extraction tool in order to keep these thesauri up to date in constantly changing fields and to create "ex nihilo" thesauri for new fields. This is the reason why the terminological extracting software, LEXTER, was developed, forming the first link in the chain that goes to make up the thesaurus. A corpus of french-language texts is fed into LEXTER, which gives out a list of likely terminological units, which are then passed on to art expert for validation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Endogeneous Corpus-Based Method for Structural Noun Phrase Disambiguation

In this paper, we describe a method for structural noun phrase disambiguation which mainly relies on the examination of the text corpus under analysis and doesn't need to integrate any domain-dependent lexicoor syntactico-semantic information. This method is implemented in the Terminology Extraction Sotware LEXTER. We first explain why the integration of LEXTER in the LEXTER-K project, which ai...

متن کامل

Use of Articles in Learning English as a Foreign Language: A Study of Iranian English Undergraduates

The significance of error analysis for the learner, the teacher and the researcher is now widely recognized. Earlier studies of error analysis concentrated on intersystematic comparison of the “native language” and the “target language” and drew the required data largely from intuitions and impressionistic observations. This study was conducted on the basis of the following observations: (1) to...

متن کامل

UvT: The UvT Term Extraction System in the Keyphrase Extraction Task

The UvT system is based on a hybrid, linguistic and statistical approach, originally proposed for the recognition of multiword terminological phrases, the C-value method (Frantzi et al., 2000). In the UvT implementation, we use an extended noun phrase rule set and take into consideration orthographic and morphological variation, term abbreviations and acronyms, and basic document structure info...

متن کامل

Paradigmatic Modifiability Statistics for the Extraction of Complex Multi-Word Terms

We here propose a new method which sets apart domain-specific terminology from common non-specific noun phrases. It is based on the observation that terminological multi-word groups reveal a considerably lesser degree of distributional variation than non-specific noun phrases. We define a measure for the observable amount of paradigmatic modifiability of terms and, subsequently, test it on bigr...

متن کامل

Terminology extraction from medical texts in Polish

BACKGROUND Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need informat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1992